177 research outputs found

    Blind dereverberation of speech from moving and stationary speakers using sequential Monte Carlo methods

    Speech signals radiated in confined spaces are subject to reverberation due to reflections from surrounding walls and obstacles. Reverberation severely degrades speech intelligibility and can be prohibitive for applications in which speech is digitally recorded, such as audio conferencing or hearing aids. Dereverberation of speech is therefore an important field in speech enhancement. Driven by consumer demand, blind speech dereverberation has become a popular research topic and has led to many interesting approaches in the literature. However, most existing methods are dictated by their underlying models and hence rely on assumptions that restrict them to specific subproblems of blind speech dereverberation. For example, many approaches limit the dereverberation to voiced speech sounds, leading to poor results for unvoiced speech. Few approaches tackle single-sensor blind speech dereverberation, and only a very limited subset allows for dereverberation of speech from moving speakers. The aim of this dissertation is therefore to develop a flexible and extensible framework for blind speech dereverberation that accommodates different speech sound types, single- or multi-sensor processing, and stationary as well as moving speakers. Bayesian methods benefit from, rather than being dictated by, appropriate model choices. The problem of blind speech dereverberation is therefore considered from a Bayesian perspective in this thesis. A generic sequential Monte Carlo approach accommodating a multitude of models for the speech production mechanism and room transfer function is consequently derived. In this approach, both the anechoic source signal and the reverberant channel are estimated using their optimal estimators by means of Rao-Blackwellisation of the state space of unknown variables. The remaining model parameters are estimated using sequential importance resampling.
    The proposed approach is implemented for two different speech production models for stationary speakers, demonstrating a substantial reduction in reverberation for both unvoiced and voiced speech sounds. Furthermore, the channel model is extended to facilitate blind dereverberation of speech from moving speakers. Owing to the structure of the measurement model, both single- and multi-microphone processing are supported: this accommodates physically constrained scenarios in which only a single sensor can be used, and allows spatial diversity to be exploited where the physical size of the microphone array is of no concern. The dissertation concludes with a survey of possible directions for future research, including the use of switching Markov source models, joint target tracking and enhancement, and an extension to subband processing for improved computational efficiency.
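
    The sequential importance resampling step named in this abstract can be illustrated with a generic bootstrap particle filter. The following is a minimal sketch, not the dissertation's method: it assumes a hypothetical scalar random-walk state observed in Gaussian noise, leaves out the Rao-Blackwellised estimation of source and channel entirely, and all names and default values (`sir_filter`, `q_std`, `r_std`, `n_particles`) are illustrative assumptions.

```python
import math
import random


def sir_filter(observations, n_particles=500, q_std=0.1, r_std=0.5, seed=0):
    """Bootstrap SIR filter for an assumed toy model (not from the source):
    x_t = x_{t-1} + N(0, q_std^2),  y_t = x_t + N(0, r_std^2)."""
    rng = random.Random(seed)
    # Initial particles drawn from an assumed N(0, 1) prior.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # 1. Propagate each particle through the assumed state transition.
        particles = [x + rng.gauss(0.0, q_std) for x in particles]
        # 2. Weight each particle by the Gaussian likelihood of the observation.
        weights = [math.exp(-0.5 * ((y - x) / r_std) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        # 3. Posterior-mean estimate from the weighted particles.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # 4. Multinomial resampling: draw particles in proportion to weight.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

    Calling `sir_filter([2.0] * 50)` returns one posterior-mean estimate per observation; under the assumed model the estimates move from the prior mean towards the observed level. Multinomial resampling is used here only because it is the simplest scheme to write down; the abstract does not say which resampling scheme the dissertation uses.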

    End-to-End Classification of Reverberant Rooms using DNNs

    Reverberation is present in our workplaces, our homes and even in places designed as auditoria, such as concert halls and theatres. This work investigates how deep learning can use the effect of reverberation on speech to classify a recording in terms of the room in which it was recorded. Approaches previously taken in the literature for this task relied on handpicked acoustic parameters as features for classifiers. Estimating the values of these parameters from reverberant speech involves estimation errors, which inevitably affect the classification accuracy. This paper shows how DNNs can perform the classification in an end-to-end fashion, that is, by operating directly on reverberant speech. On this basis, a method for training generalisable DNN classifiers and a DNN architecture for the task are proposed. The relationship between the feature maps derived by DNNs and acoustic parameters that describe known properties of reverberation is also studied. The experiments use AIRs measured in 7 real rooms. The classification accuracy of DNNs is compared between the case of having access to the AIRs and the case of having access only to the reverberant speech recorded in the same rooms. The experiments show that with access to the AIRs a DNN achieves an accuracy of 99.1%, and with access only to reverberant speech the proposed DNN achieves an accuracy of 86.9%. The experiments replicate the testing procedure used in previous work, which relied on handpicked acoustic parameters, allowing direct evaluation of the benefit of using deep learning. Comment: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

    Measuring work and workers: Wearables and digital assistance systems in manufacturing and logistics

    The smart glove or smart data glasses: Digitalization of work means that technology is moving closer to the bodies of employees. It can make movements, vital signs and even emotions visible. Technologies that many people use privately to monitor their sporting activities or health open up a new dimension of control in the workplace, but also the possibility of supporting employees in complex work processes. Based on case studies of companies in manufacturing and logistics as well as a survey of employees, this study provides insights into operational use cases of wearables and into how employees assess them. It reveals contradictory experiences and shows that co-determination and co-design of new technologies by employees and works councils are an important condition for using new technologies to improve the quality of work.

    Signal compaction using polynomial EVD for spherical array processing with applications

    Multi-channel signals captured by spatially separated sensors often contain a high level of data redundancy. A compact signal representation enables more efficient storage and processing, which has been exploited for data compression, noise reduction, and speech and image coding. This paper focuses on the compact representation of speech signals acquired by spherical microphone arrays. A polynomial matrix eigenvalue decomposition (PEVD) can spatially decorrelate signals over a range of time lags and is known to achieve optimum multi-channel data compaction. However, the complexity of PEVD algorithms scales at best cubically with the number of channel signals, e.g., the number of microphones in a spherical array used for processing. In contrast, the spherical harmonic transform (SHT) provides a compact spatial representation of the 3-dimensional sound field measured by spherical microphone arrays, referred to as eigenbeam signals, at a cost that rises only quadratically with the number of microphones. Yet the SHT's spatially orthogonal basis functions cannot completely decorrelate sound field components over a range of time lags. In this work, we propose to exploit the compact representation offered by the SHT to reduce the number of channels used for subsequent PEVD processing. In the proposed framework for signal representation, we show that the diagonality factor improves by up to 7 dB over the microphone signal representation at a significantly lower computational cost. Moreover, when this framework is applied to speech enhancement and source separation, the proposed method improves the short-time objective intelligibility (STOI) and source-to-distortion ratio (SDR) metrics by up to 0.2 and 20 dB, respectively.
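
    The channel-reduction argument in this abstract can be made concrete with a small calculation. The sketch below is illustrative only: it uses the standard count of (N + 1)^2 eigenbeams for spherical harmonic order N, takes the abstract's best-case cubic scaling as a cost proxy, and ignores the cost of the SHT itself. The function names and the example array (32 microphones, order 3) are assumptions, not values from the paper.

```python
def eigenbeam_count(sh_order):
    """Number of spherical-harmonic (eigenbeam) channels up to the given order."""
    return (sh_order + 1) ** 2


def cubic_cost_ratio(n_mics, sh_order):
    """PEVD cost on eigenbeams relative to PEVD on raw microphone signals,
    assuming cost proportional to the cube of the channel count."""
    return (eigenbeam_count(sh_order) / n_mics) ** 3


# Hypothetical example: a 32-microphone array processed at order 3.
channels = eigenbeam_count(3)       # 16 eigenbeam channels
ratio = cubic_cost_ratio(32, 3)     # 0.125, i.e. one eighth of the cubic cost
```

    Under these assumptions, halving the channel count cuts the cubic term by a factor of eight, which is the kind of saving the abstract attributes to running the PEVD on eigenbeam signals.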

    Ausbau der Stromnetze im Rahmen der Energiewende. Stakeholder Panel TA [Expansion of the electricity grids as part of the energy transition. Stakeholder Panel TA]

    This report presents the Stakeholder Panel TA and the results of the first online survey conducted within the panel, "Expansion of the electricity grids as part of the energy transition", which ran from 26 November 2014 to 15 January 2015. The results show clear support across all stakeholder groups for the central goals of the energy transition (Energiewende). About three quarters of respondents agree with reducing the use of fossil fuels and with the targeted expansion of renewable energies. By contrast, a majority views the expansion of the electricity grids critically. This critical attitude towards grid expansion also influences the overall assessment of the energy transition adopted by the German Federal Government.

    Neue elektronische Medien und Gefahrenpotenziale exzessiver Nutzung. Stakeholder Panel TA [New electronic media and the risk potential of excessive use. Stakeholder Panel TA]

    This report presents the results of the online survey "New electronic media and the risk potential of excessive use", which was conducted from 12 May 2015 to 31 July 2015. It supplements TAB Working Report No. 166, "Neue elektronische Medien und Suchtverhalten" (New electronic media and addictive behaviour), which already contains a summary of the survey results, and documents, among other things, the analysis of comments from survey participants. The report continues the publication series on the Stakeholder Panel TA.

    Online-Bürgerbeteiligung an der Parlamentsarbeit. Stakeholder Panel TA [Online citizen participation in parliamentary work. Stakeholder Panel TA]

    This report documents the results of the online survey "Online citizen participation in parliamentary work", which was conducted from 10 September 2015 to 2 November 2015. This third survey wave of the Stakeholder Panel TA addressed the assessments and experiences of societal stakeholders regarding the online citizen participation services offered by the German Bundestag. It focused on interest in participation services and on the factors that motivate or discourage their use. Requirements for online citizen participation were also surveyed. The more than 1,100 responses analysed show a high level of awareness and use, particularly of e-petitions. The participation services of other committees and bodies, by contrast, are used only by a minority. The most important motives named for participating are the importance of a topic and being personally affected by it. The survey shows an overall high level of interest in online citizen participation at the Bundestag. It also yielded more than 600 comments and suggestions on how such services should be designed or improved from the stakeholders' point of view. In addition to these results, the report presents the survey methodology and basic sociodemographic data on the participants. Although the results do not claim to be representative, they provide a basis for reflecting on the German Bundestag's experience to date with online citizen participation services. The results are also part of TAB Working Report No. 173.

    Gesundheits-Apps. Stakeholder Panel TA [Health apps. Stakeholder Panel TA]

    This Stakeholder Panel report presents the results of the online survey "Health apps", which was publicly accessible via the Stakeholder Panel TA website from 13 September 2016 to 31 December 2016. It supplements TAB Working Report No. 179, "Gesundheits-Apps. Innovationsanalyse" (Health apps. Innovation analysis), which contains a summary of the survey results, and documents, among other things, the analysis of comments from survey participants. The report continues the publication series on the Stakeholder Panel TA.

    Phenotypic and molecular insights into CASK-related disorders in males

    Background: Heterozygous loss-of-function mutations in the X-linked CASK gene cause progressive microcephaly with pontine and cerebellar hypoplasia (MICPCH) and severe intellectual disability (ID) in females. Different CASK mutations have also been reported in males. The associated phenotypes range from nonsyndromic ID to Ohtahara syndrome with cerebellar hypoplasia. However, the phenotypic spectrum in males has not been systematically evaluated to date. Methods: We identified a CASK alteration in 8 novel unrelated male patients by targeted Sanger sequencing, copy number analysis (MLPA and/or FISH) and array CGH. CASK transcripts were investigated by RT-PCR followed by sequencing. Immunoblotting was used to detect CASK protein in patient-derived cells. The clinical phenotype and natural history of the 8 patients and 28 CASK-mutation positive males reported previously were reviewed and correlated with available molecular data. Results: CASK alterations include one nonsense mutation, one 5-bp deletion, one mutation of the start codon, and five partial gene deletions and duplications; seven were de novo, including three somatic mosaicisms, and one was familial. In three subjects, specific mRNA junction fragments indicated an in-tandem duplication of CASK exons that disrupts the integrity of the gene. The 5-bp deletion resulted in multiple aberrant CASK mRNAs. In fibroblasts from patients with a CASK loss-of-function mutation, no CASK protein could be detected. Individuals who are mosaic for a severe CASK mutation or carry a hypomorphic mutation still showed a detectable amount of protein.
    Conclusions: Based on eight novel patients and all CASK-mutation positive males reported previously, three phenotypic groups can be distinguished that represent a clinical continuum: (i) MICPCH with severe epileptic encephalopathy caused by hemizygous loss-of-function mutations, (ii) MICPCH associated with inactivating alterations in the mosaic state or a partly penetrant mutation, and (iii) syndromic/nonsyndromic mild to severe ID with or without nystagmus caused by CASK missense and splice mutations that leave the CASK protein intact but likely alter its function or reduce the amount of normal protein. Our findings facilitate focused testing of the CASK gene and the interpretation of sequence variants identified by next-generation sequencing in cases with a phenotype resembling any of the three groups.